Asynchronous Parallel Computing Model of Global Motion Estimation with CUDA
Authors
Abstract
In video coding, to balance coding rate against image quality, we apply a global motion search algorithm to avoid loss of image quality and use the parallel computing capacity of graphics processors to accelerate the encoding process. Based on the heterogeneous CPU+GPU system and on the multi-threaded parallel structure and thread-synchronization features of the CUDA platform, we build a CUDA computing model suited to global motion search. The model uses CUDA's thread-synchronization mechanism to resolve data-consistency problems and to improve the efficiency of on-chip data communication, and it uses CUDA's asynchronous mechanism to hide the latency introduced by CPU-side functions. Experimental results show that the CUDA-based parallel computing model significantly improves the efficiency of the motion estimation algorithm, and that the asynchronous parallel model built on CUDA's asynchronous mechanism yields some additional improvement.
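The sketch below shows the kind of exhaustive block-matching search the abstract's quality argument points to. It assumes that "global motion search" means a full search over a square window and uses the sum of absolute differences (SAD) as the matching cost; neither detail is stated in the abstract. The names `MotionVector`, `blockSad` and `fullSearch`, the row-major 8-bit frame layout, the block size and the search range are all hypothetical. It is a sequential C++ reference, not the authors' CUDA model: the CUDA-specific pieces the abstract emphasizes (thread synchronization and asynchronous execution) are only indicated in comments.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Hypothetical result type: best displacement and its matching cost.
struct MotionVector {
    int dx;   // horizontal displacement into the reference frame
    int dy;   // vertical displacement into the reference frame
    int sad;  // sum of absolute differences at (dx, dy)
};

// SAD between the blockSize x blockSize block at (bx, by) in `cur` and the
// block displaced by (dx, dy) in `ref`. Frames are row-major 8-bit luma
// planes of the given width; the caller guarantees both blocks are in bounds.
static int blockSad(const std::vector<std::uint8_t>& cur,
                    const std::vector<std::uint8_t>& ref, int width,
                    int bx, int by, int dx, int dy, int blockSize) {
    int sad = 0;
    for (int y = 0; y < blockSize; ++y) {
        for (int x = 0; x < blockSize; ++x) {
            int c = cur[(by + y) * width + (bx + x)];
            int r = ref[(by + dy + y) * width + (bx + dx + x)];
            sad += std::abs(c - r);
        }
    }
    return sad;
}

// Exhaustive search over every displacement in [-range, range]^2 whose
// reference block lies fully inside the frame. Every candidate is
// independent, which is what makes the search easy to spread over GPU
// threads; in a CUDA version, the final "keep the minimum" step would
// typically be a shared-memory reduction separated by __syncthreads().
MotionVector fullSearch(const std::vector<std::uint8_t>& cur,
                        const std::vector<std::uint8_t>& ref,
                        int width, int height, int bx, int by,
                        int blockSize, int range) {
    MotionVector best{0, 0, std::numeric_limits<int>::max()};
    for (int dy = -range; dy <= range; ++dy) {
        for (int dx = -range; dx <= range; ++dx) {
            const int rx = bx + dx;
            const int ry = by + dy;
            if (rx < 0 || ry < 0 || rx + blockSize > width || ry + blockSize > height) {
                continue;  // candidate block would leave the reference frame
            }
            const int sad = blockSad(cur, ref, width, bx, by, dx, dy, blockSize);
            if (sad < best.sad) {  // strict '<': ties keep the earlier candidate
                best = {dx, dy, sad};
            }
        }
    }
    return best;
}
```

For example, on 32x32 frames where, inside the searched block, each current pixel at (x, y) equals the reference pixel at (x + 3, y - 2), calling `fullSearch(cur, ref, 32, 32, 12, 12, 8, 4)` recovers the displacement (3, -2) with zero SAD. The search costs (2 * range + 1)^2 SAD evaluations per block, which is why exhaustive search is usually paired with massive parallelism rather than run sequentially.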
Similar sources
Asynchronous Parallel Computing Algorithm implemented in 1D Heat Equation with CUDA
In this note, we present the stability as well as performance analysis of asynchronous parallel computing algorithm implemented in 1D heat equation with CUDA. The primary objective of this note lies in dissemination of asynchronous parallel computing algorithm by providing CUDA code for fast and easy implementation. We show that the simulations carried out on nVIDIA GPU device with asynchronous...
Parallel Motion Estimation Implementation for Different Block Matching Algorithms onto GPGPU
This work presents an efficient method to map Motion Estimation (ME) algorithms onto General Purpose Graphic Processing Unit (GPGPU) architectures using CUDA programming model. Our method jointly exploits the massive parallelism available in current GPGPU devices and the parallelization potential of ME algorithms: Full Search (FS) and Diamond Search (DS). Our main goal is to evaluate the feasib...
Parallelized Block Match Algorithm on Multi-core Processors
In order to increase the coding efficiency of H.264/AVC, this paper proposes a parallelized approach to the full search (FS) algorithm for motion estimation on a Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA). By utilizing the independence among different MBs, we are able to take full advantage of the computational power of CUDA and speed up the FS motion estimati...
Multiprocessing Acceleration of H.264/AVC Motion Estimation Full Search Algorithm under CUDA Architecture
This work presents a parallel GPU-based solution for the Motion Estimation (ME) process in a video encoding system. We propose a way to partition the steps of the Full Search block matching algorithm in the CUDA architecture, and compare the performance with a theoretical model and two implementations (sequential, and parallel using the OpenMP library). We obtained an $O(n^2/\log_2 n)$ speed-up which fits t...
Multiprocessing GPU Acceleration of H.264/AVC Motion Estimation under CUDA Architecture
This work presents a parallel GPU-based solution for the Motion Estimation (ME) process in a video encoding system. We propose a way to partition the steps of the Full Search block matching algorithm in the CUDA architecture, and compare the performance with a theoretical model and two implementations (sequential, and parallel using the OpenMP library). We obtained an $O(n^2/\log_2 n)$ speed-up wh...
Journal:
- JCP
Volume 7, Issue
Pages -
Publication date: 2012